Abstract

We study sorting algorithms based on randomized round-robin comparisons. Specifically, we study 
Spin-the-bottle sort, where comparisons are unrestricted, and Annealing sort, where comparisons are 
restricted to a distance bounded by a temperature parameter. Both algorithms are simple, randomized, 
data-oblivious sorting algorithms, which are useful in privacy-preserving computations, but, as we show, 
Annealing sort is much more efficient. We show that there is an input permutation that causes Spin-the-bottle
sort to require Ω(n^2 log n) expected time in order to succeed, and that in O(n^2 log n) time this
algorithm succeeds with high probability for any input. We also show there is an implementation of
Annealing sort that runs in O(n log n) time and succeeds with very high probability.



1 Introduction 

The sorting problem is classic in computer science, with well over a fifty-year history (e.g., see [3, 20, 24,
39, 42]). In this problem, we are given an array, A, of n elements taken from some total order and we
are interested in permuting A so that the elements are listed in order.¹ In this paper, we are interested
in randomized sorting algorithms based on simple round-robin strategies of scanning the array A while
performing, for each i = 1, 2, ..., n, a compare-exchange operation between A[i] and A[s], where s is a
randomly-chosen index not equal to i.

In addition to its simplicity, sorting via round-robin compare-exchange operations in this manner is
data-oblivious. That is, if we view compare-exchange operations as a black-box primitive, then the sequence
of operations performed by such a randomized sorting algorithm is independent of the input permutation.
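As a small illustration of this property, the following Python sketch (our own; the helper names `compare_exchange` and `round_robin_pass` are hypothetical, not from the paper) performs one 0-indexed round-robin scan in which each index i is compare-exchanged with a uniformly random s ≠ i, and it records the index pairs it touches. Because the random choices come from a seeded generator that never inspects the data, two different inputs yield the same trace.

```python
import random

def compare_exchange(a, lo, hi):
    """Black-box primitive: ensure a[lo] <= a[hi] for lo < hi."""
    if a[lo] > a[hi]:
        a[lo], a[hi] = a[hi], a[lo]

def round_robin_pass(a, rng):
    """One round-robin scan over a (len(a) >= 2); returns the (lo, hi) pairs compared."""
    n = len(a)
    trace = []
    for i in range(n):
        s = rng.randrange(n - 1)   # uniform over {0, ..., n-2}
        if s >= i:
            s += 1                 # now uniform over {0, ..., n-1} minus {i}
        lo, hi = min(i, s), max(i, s)
        compare_exchange(a, lo, hi)
        trace.append((lo, hi))
    return trace

# Same seed, different data: the sequence of compared positions is identical.
t1 = round_robin_pass([5, 3, 9, 1, 7, 2], random.Random(42))
t2 = round_robin_pass([1, 2, 3, 5, 7, 9], random.Random(42))
print(t1 == t2)  # True
```

The trace depends only on n and the random seed, which is exactly what lets such an algorithm be viewed as a (random) sorting network.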

Any data-oblivious sorting algorithm can also be viewed as a sorting network [26], where the elements
in the input array are provided on n input wires and internal gates are compare-exchange operations. Ajtai,
Komlos, and Szemeredi (AKS) [1] give a sorting network with O(n log n) compare-exchange gates, but their
method is quite complicated and has a very large constant factor, even with known improvements [32, 38].
Leighton and Plaxton [27] and Goodrich [17] describe alternative randomized sorting networks that use
O(n log n) compare-exchange gates and sort any given input array with very high probability. None of
these previous approaches are based on simple round-robin comparison strategies, however.

Data-oblivious sorting algorithms are often motivated by their ability to be implemented in special-purpose
hardware modules [24], but such algorithms also have applications in secure multi-party computation
(SMC) protocols (e.g., see [4, 10, 14, 15, 28, 29]). In such protocols, two or more parties separately hold
different portions of a set of data values, {x_1, x_2, ..., x_n}, and are interested in computing some function,
f(x_1, x_2, ..., x_n), without revealing their respective data values (e.g., see [4, 28, 40]). Thus, the design of
simpler data-oblivious sorting algorithms can lead to simpler SMC protocols.



¹ Since we are focusing on comparison-based algorithms here, let us assume, without loss of generality, that the elements of A
are distinct, e.g., by a mapping A[i] → (A[i], i) and then using lexicographic ordering for comparisons.
1.1 Previous Related Work

Despite the simplicity of such algorithms, we are not aware of previous work on data-oblivious sorting algorithms
based on round-robin random comparisons. We therefore review below some of the previous work on sorting that
is related to the various properties that are of interest in this paper.

Sorting via Random Comparisons. Biedl et al. [5] analyze a simple algorithm, Guess-sort, which iteratively
picks two elements in the input array at random and performs a compare-exchange for them, and they
show that this method runs in expected time Θ(n^2 log n). In addition, Gruber et al. [19] perform a more
exact analysis of this algorithm, which they call Bozo-sort. Neither of these papers considers round-robin
random comparisons, however.

Quicksort. Of course, the randomized Quicksort algorithm sorts via round-robin comparisons against a
randomly-chosen element, known as a pivot (e.g., see [11, 18, 36]), and this leads to a sorting algorithm
that runs in O(n log n) time with high probability. Even so, the set of comparisons is highly dependent on
input values. Thus, randomized Quicksort is not a data-oblivious algorithm based on random round-robin
compare-exchange operations.

Shellsort. Sorting via data-oblivious round-robin random comparisons has a similar flavor to randomized
Shellsort [17], which sorts via random matchings between various subarrays of the input array. Nevertheless,
there are some important differences between randomized Shellsort and sorting via round-robin random
compare-exchange operations. For instance, the analysis of randomized Shellsort requires an extensive
postprocessing step, which we avoid in the analysis of our randomized round-robin sorting algorithms. We
also avoid the complexity of previous analyses of deterministic variants of Shellsort (e.g., see [22, 23, 33]),
such as that by Pratt [34], which leads to the best known performance for deterministic Shellsort, namely, a
worst-case running time of O(n log^2 n). (See also the excellent survey of Sedgewick [37].)

Sorting via Round-robin Passes. Sorting by deterministic round-robin passes is, of course, a classic approach,
as in the well-known Bubble-sort algorithm (e.g., see [11, 18, 36]). For instance, Dobosiewicz [13]
proposes sorting via various bubble-sort passes, doing a left-to-right sequence of compare-exchanges between
elements at offset-distances apart. In addition, Incerpi and Sedgewick [21, 22] study a version of
Shellsort that replaces the inner loop with a round-robin "shaker" pass (see also [9, 41]), which is a left-to-right
bubble-sort pass followed by a right-to-left bubble-sort pass. These algorithms do not ultimately lead
to a time performance that is O(n log n), however.

1.2 Our Results 

In this paper, we study two sorting algorithms based on randomized round-robin comparisons. Specifically, 
we study an algorithm we are calling "Spin-the-bottle sort," where comparisons in each round are arbitrary, 
and an algorithm we are calling "Annealing sort," where comparisons are restricted to a distance bounded 
by a temperature parameter. These algorithms are therefore similar to one another, with both being simple, 
data-oblivious sorting algorithms based on round-robin random compare-exchange operations. 

Their respective performance is quite different, however, in that we show there is an input permutation
that causes Spin-the-bottle sort to require an expected running time that is Ω(n^2 log n) in order to succeed,
and that Spin-the-bottle sort succeeds with high probability for any input permutation in O(n^2 log n) time.
That is, Spin-the-bottle sort has an asymptotic expected running time that is actually worse than Bubble sort!

Thus, it is perhaps a bit surprising that, with just a couple of minor changes, Spin-the-bottle sort can be
transformed into Annealing sort, which is much more efficient. In particular, Annealing sort is derived by
applying the simulated annealing [25] meta-heuristic to Spin-the-bottle sort. There are, of course, multiple
ways to apply this meta-heuristic, but we show there is a version of Annealing sort that runs in O(n log n)
time and succeeds with very high probability.²

² We say an algorithm succeeds with very high probability if success occurs with probability 1 − 1/n^ρ, for some constant ρ > 1.
2 Spin-the-bottle Sort

The simplest sorting algorithm we consider in this paper is Spin-the-bottle sort,³ which is given in Figure 1.



while A is not sorted do
    for i = 1 to n do
        Choose s uniformly and independently at random from {1, 2, ..., i − 1, i + 1, ..., n}.
        if (i < s and A[i] > A[s]) or (i > s and A[i] < A[s]) then
            Swap A[i] and A[s].

Figure 1: Spin-the-bottle sort.



The test for A being sorted is either done via a straightforward linear-time scan of A or by a heuristic
based on counting the number of rounds needed until it is highly likely that A is sorted. In the latter case, this
leads to a data-oblivious sorting algorithm, that is, a sorting algorithm for which the sequence of
compare-exchange operations is independent of the values of the input, depending only on its size.
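A minimal, runnable rendering of Figure 1 might look like the following Python sketch. It assumes 0-indexed lists, uses the linear-time scan as the sortedness test, and returns the number of rounds performed; the function names and the `rng` parameter are our own choices, not part of the paper.

```python
import random

def is_sorted(a):
    """Linear-time scan used as the loop test."""
    return all(a[k] <= a[k + 1] for k in range(len(a) - 1))

def spin_the_bottle_sort(a, rng=None):
    """Sort list a in place following Figure 1 (0-indexed); return the rounds used."""
    if rng is None:
        rng = random.Random()
    n = len(a)
    rounds = 0
    while not is_sorted(a):
        rounds += 1
        for i in range(n):                 # one round-robin round
            s = rng.randrange(n - 1)
            if s >= i:
                s += 1                     # s is uniform over all indices except i
            if (i < s and a[i] > a[s]) or (i > s and a[i] < a[s]):
                a[i], a[s] = a[s], a[i]
    return rounds

data = list(range(30))
random.Random(1).shuffle(data)
print(spin_the_bottle_sort(data, random.Random(2)), data == list(range(30)))
```

Replacing the `while` test with a fixed number of rounds gives the data-oblivious variant described above, at the cost of a small probability that the output is not sorted.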

2.1 A Lower Bound on the Expected Running Time of Spin-the-bottle Sort 

Our analysis of Spin-the-bottle sort is fairly straightforward and shows that this algorithm is asymptotically
worse than almost all other published sorting algorithms. Nevertheless, let us go through some details of
this analysis, as it provides some intuition into how improvements can be made, which in turn leads to a much
more efficient algorithm, Annealing sort.

Let us begin with a lower bound on the expected running time for Spin-the-bottle sort. As was done in
the analysis of Guess-Sort [5], let us consider the input array A = (2, 1, 4, 3, ..., n, n − 1), for even n, albeit now with
a different argument as to why this is a difficult input instance.

This array has N = n/2 inversions, with each element participating in exactly one inversion. During any
scan of A, each element that has yet to have its inversion resolved picks its inversion partner, and so resolves
its inversion, with probability 1/(n − 1). Considering the sequence of compare-exchange operations that Spin-the-bottle sort performs
until A is sorted, let us divide this sequence into maximal epochs of comparisons that do not resolve an
inversion followed by one that does. Let X_1, X_2, ..., X_N be a set of random variables where X_i denotes
the number of comparisons performed in epoch i, and observe that there are N − i inversions remaining in
A after epoch i. Likewise, let Y_1, Y_2, ..., Y_N be a set of random variables where Y_i denotes the number of
comparisons performed in epoch i, but only counting a comparison if the element being scanned has
not had its inversion resolved in a previous epoch. Note that

$$X_i \ge n \left\lfloor \frac{Y_i}{n - 2(i-1)} \right\rfloor,$$

since one full round performed in epoch i involves n comparisons, of which n − 2(i − 1) are for elements
that have yet to have their inversions resolved.

The running time of Spin-the-bottle sort is proportional to

$$X = \sum_{i=1}^{N} X_i.$$



³ The name comes from a party game, Spin the bottle, where a group of players sit in a circle and take turns, in a round-robin
fashion, spinning a bottle in the middle of the circle. When it is a player's turn, he or she spins the bottle and then kisses the person
of the appropriate gender nearest to where the bottle points.
Each Y_i is a geometric random variable with parameter p = 1/(n − 1); hence, E(Y_i) = n − 1. Thus, since
n − 2(i − 1) = 2(N − i + 1),

$$E(X) \ge \sum_{i=1}^{N} n\left(\frac{E(Y_i)}{n - 2(i-1)} - 1\right) = \frac{n(n-1)}{2}\sum_{i=1}^{N}\frac{1}{N-i+1} - nN = \frac{n(n-1)}{2}H_N - nN,$$

where H_m denotes the mth Harmonic number. Thus, E(X) is Ω(n^2 log n) for this input array, giving us the
following.

Theorem 2.1: There is an input causing Spin-the-bottle sort to have an expected running time of Ω(n^2 log n).
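To get a feel for the constant hidden in this bound, the sketch below simply evaluates the lower bound (n(n − 1)/2)H_N − nN derived above for the input (2, 1, 4, 3, ..., n, n − 1) and divides it by n^2 ln n. It is an illustrative calculation only, assuming n is even; the helper names are hypothetical.

```python
import math

def harmonic(m):
    """H_m = 1 + 1/2 + ... + 1/m."""
    return sum(1.0 / k for k in range(1, m + 1))

def bad_input(n):
    """The array (2, 1, 4, 3, ..., n, n - 1) for even n."""
    return [v for k in range(1, n // 2 + 1) for v in (2 * k, 2 * k - 1)]

def spin_lower_bound(n):
    """(n(n-1)/2) * H_N - n*N with N = n/2, the bound on E(X) derived above."""
    assert n >= 2 and n % 2 == 0
    N = n // 2
    return n * (n - 1) / 2 * harmonic(N) - n * N

for n in (2**10, 2**13, 2**16):
    print(n, round(spin_lower_bound(n) / (n * n * math.log(n)), 3))
```

Since H_N = ln N + O(1) and N = n/2, the printed ratio creeps up toward 1/2 from below as n grows.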

An important lesson to take away from the proof of the above theorem is that a set of inversions between
pairs of nearby elements in A is sufficient to cause Spin-the-bottle sort to have a relatively large expected
running time. Intuitively, the algorithm spends a lot of time, for each element A[i], looking throughout
the entire array for an inversion that is caused by an element right "next door" to A[i]. Interestingly, this
same intuition applies to our upper bound for the running time of Spin-the-bottle sort.



2.2 An Upper Bound on the Running Time of Spin-the-bottle Sort 

Let us now consider an upper bound on the running time of Spin-the-bottle sort. Our analysis is based
on characterizations involving M, the number of inversions present in A when it is given as input to the
algorithm. Let M_j denote the number of inversions that exist in A at the beginning of round j (where a
round involves a complete scan of A), so M_1 = M. In addition, let m_{i,j} denote the number of inversions
that exist at the beginning of round j and involve A[i], and observe that

$$\sum_{i=1}^{n} m_{i,j} = 2M_j,$$

since each inversion involves exactly two elements.

We divide the course of the algorithm into three phases, depending on the value of M_j:

• Phase 1: M_j ≥ 12n log n

• Phase 2: 12n ≤ M_j < 12n log n

• Phase 3: M_j < 12n.
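The quantities M_j and m_{i,j} are easy to compute directly. The following Python sketch (a brute-force O(n^2) illustration, not part of the algorithm) counts the inversions involving each element, from which the identity Σ_i m_{i,j} = 2M_j can be checked, and reports which phase of the analysis a given inversion count falls in. It assumes base-2 logarithms, which the text does not specify; the function names are hypothetical.

```python
import math

def inversion_counts(a):
    """m[i] = number of inversions (out-of-order pairs) that involve a[i]."""
    n = len(a)
    m = [0] * n
    for i in range(n):
        for s in range(i + 1, n):
            if a[i] > a[s]:        # (i, s) is an inversion; charge it to both ends
                m[i] += 1
                m[s] += 1
    return m

def analysis_phase(M, n):
    """Phase (1, 2, or 3) for a round that starts with M inversions."""
    if M >= 12 * n * math.log2(n):
        return 1
    if M >= 12 * n:
        return 2
    return 3

a = [2, 1, 4, 3, 6, 5]
m = inversion_counts(a)
M = sum(m) // 2                    # each inversion was counted twice
print(m, M, analysis_phase(M, len(a)))  # [1, 1, 1, 1, 1, 1] 3 3
```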

Theorem 2.2: Given an array A of n elements, the three phases of Spin-the-bottle sort run in O(n^2 log n)
time and sort A with very high probability.

Proof: See Appendix A. ■

This, of course, is no great achievement, since there are several simple deterministic data-oblivious
sorting algorithms that run in O(n log^2 n) time and even Bubble sort itself is faster than Spin-the-bottle sort,
running in O(n^2) time. But the above three-phase characterization nevertheless gives us some intuition that
leads to a more efficient sorting algorithm, which we discuss next.
3 Annealing Sort

The sorting algorithm we discuss in this section is based on applying the simulated annealing [25] meta-heuristic
to the sorting problem. Following an analogy from metallurgy, simulated annealing involves
solving an optimization problem by a sequence of choices, such that choice j is made from among some r_j
neighbors of a current state that are confined to be within a distance bounded from above by a parameter
T_j (according to an appropriate metric). Given the metallurgical analogy, the parameter T_j is called the
temperature, which is gradually decreased during the algorithm according to an annealing schedule, until
it is 0, at which point the algorithm halts.

Let us apply this meta-heuristic to sorting, which is admittedly not an optimization problem, so some
adaptation is required. That is, let us view each round in a sorting algorithm that is similar to Spin-the-bottle
sort as a step in a simulated annealing algorithm. Since each compare-exchange operation is chosen at
random, let us now limit, in round j, the distance between candidate comparison elements to a parameter
T_j, so as to implement the temperature metaphor, and let us also repeat the random choices for each element
r_j times, so as to implement a notion of neighbors of the current state under consideration. The sequence of
T_j and r_j values defines the annealing schedule for our Annealing sort.

Formally, let us assume we are given an annealing schedule defined by the following:

• A temperature sequence, 𝒯 = (T_1, T_2, ..., T_t), where T_i ≥ T_{i+1}, for i = 1, ..., t − 1, and T_t = 0.

• A repetition sequence, ℛ = (r_1, r_2, ..., r_t).

Given these two sequences, Annealing sort is as given in Figure 2.



for j = 1 to t do
    for i = 1 to n − 1 do
        for k = 1 to r_j do
            Let s be a random integer in the range [i + 1, min{n, i + T_j}].
            if A[i] > A[s] then
                Swap A[i] and A[s]
    for i = n downto 2 do
        for k = 1 to r_j do
            Let s be a random integer in the range [max{1, i − T_j}, i − 1].
            if A[s] > A[i] then
                Swap A[i] and A[s]

Figure 2: Annealing sort. It takes as input an array, A, of n elements and an annealing schedule defined by
sequences, 𝒯 = (T_1, T_2, ..., T_t) and ℛ = (r_1, r_2, ..., r_t). Note that if the compare-exchange operations
are performed as a black box, then the algorithm is data-oblivious.
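The following Python sketch is one possible rendering of Figure 2, translated to 0-indexed lists. It assumes integer temperatures and skips rounds with T_j = 0, since their sampling ranges are empty; the function and parameter names are hypothetical.

```python
import random

def annealing_sort(a, temps, reps, rng=None):
    """Sort list a in place following Figure 2 (0-indexed).

    temps[j] is the integer temperature T_j and reps[j] the repetition count r_j.
    """
    if rng is None:
        rng = random.Random()
    n = len(a)
    for T, r in zip(temps, reps):
        if T <= 0:
            continue  # the sampling ranges below would be empty
        # Up-pass: compare a[i] with a random a[s], s in [i+1, min(n-1, i+T)].
        for i in range(n - 1):
            for _ in range(r):
                s = rng.randint(i + 1, min(n - 1, i + T))
                if a[i] > a[s]:
                    a[i], a[s] = a[s], a[i]
        # Down-pass: compare a[i] with a random a[s], s in [max(0, i-T), i-1].
        for i in range(n - 1, 0, -1):
            for _ in range(r):
                s = rng.randint(max(0, i - T), i - 1)
                if a[s] > a[i]:
                    a[i], a[s] = a[s], a[i]
    return a

data = list(range(40))
random.Random(3).shuffle(data)
# A trivially safe schedule: 40 rounds at temperature 1 are bubble/shaker passes.
print(annealing_sort(data, [1] * 40, [1] * 40) == list(range(40)))  # True
```

With n rounds at temperature 1 and one repetition, each up-pass is a deterministic bubble-sort pass, so this call sorts any input of length at most n; the schedules of interest use far fewer rounds and rely on the randomness at higher temperatures.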



The running time of Annealing sort is O(n Σ_{j=1}^{t} r_j), and its effectiveness depends on the annealing
schedule, defined by 𝒯 = (T_1, T_2, ..., T_t) and ℛ = (r_1, r_2, ..., r_t). Fortunately, there is a three-phase
annealing schedule that causes Annealing sort to run in O(n log n) time and succeed with very high probability:

• Phase 1. For this phase, let 𝒯_1 = (2n, 2n, n, n, n/2, n/2, n/4, n/4, ..., q log^6 n, q log^6 n) be the
temperature sequence and let ℛ_1 = (c, c, ..., c) be an equal-length repetition sequence (of all c's),
where q > 1 and c > 1 are constants.

• Phase 2. For this phase, let 𝒯_2 = (q log^6 n, (q/2) log^6 n, (q/4) log^6 n, ..., g log n) be the temperature
sequence and let ℛ_2 = (r, r, ..., r) be an equal-length repetition sequence, where q is the
constant from Phase 1, g > 1 is a constant determined in the analysis, and r is Θ(log n / log log n).

• Phase 3. For this phase, let 𝒯_3 and ℛ_3 be sequences of length g log n of all 1's.
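Given the annealing schedule defined by 𝒯 = (𝒯_1, 𝒯_2, 𝒯_3, 0) and ℛ = (ℛ_1, ℛ_2, ℛ_3, 0), note that the
running time of Annealing sort is O(n log n): Phase 1 has O(log n) rounds with c repetitions each, Phase 2 has
O(log log n) rounds with r = Θ(log n / log log n) repetitions each, and Phase 3 has g log n rounds with one
repetition each, so Σ_j r_j = O(log n). Let us therefore analyze its success probability.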
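The schedule can be generated mechanically. The Python sketch below builds the temperature and repetition lists for a given n and counts the resulting compare-exchange operations. The constants q, c, g, and h (with r = ⌈h log n / log log n⌉) are placeholder values chosen only for illustration, logarithms are taken base 2, temperatures are rounded down to integers, and the trailing 0 entries are omitted because they perform no work; none of these choices comes from the paper.

```python
import math

def three_phase_schedule(n, q=1.0, c=4, g=1.0, h=2.0):
    """Return (temps, reps) with the three-phase shape described above.

    q, c, g, h are placeholder constants; logs are base 2.
    """
    assert n >= 4  # so that log log n > 0
    lg = math.log2(n)
    temps, reps = [], []
    # Phase 1: (2n, 2n, n, n, n/2, n/2, ...) while at least q log^6 n; c reps each.
    low1 = q * lg ** 6
    t = 2.0 * n
    while t >= low1:
        temps += [int(t), int(t)]
        reps += [c, c]
        t /= 2
    # Phase 2: (q log^6 n, (q/2) log^6 n, ...) while at least g log n; r reps each.
    r = max(1, math.ceil(h * lg / math.log2(lg)))
    t = low1
    while t >= g * lg:
        temps.append(int(t))
        reps.append(r)
        t /= 2
    # Phase 3: g log n rounds at temperature 1, one repetition each.
    k = math.ceil(g * lg)
    temps += [1] * k
    reps += [1] * k
    return temps, reps

def comparison_count(n, reps):
    """Up-pass and down-pass each make r_j comparisons at n - 1 positions."""
    return 2 * (n - 1) * sum(reps)

for e in (10, 20, 30):
    n = 2 ** e
    temps, reps = three_phase_schedule(n)
    print(n, len(temps), sum(reps), comparison_count(n, reps) / (n * math.log2(n)))
```

For practical n, q log^6 n exceeds 2n, so Phase 1 is empty and Phase 2 starts at temperatures larger than n (which Figure 2 clamps via min{n, i + T_j}); the schedule is asymptotic in nature, which is visible in how late Phase 1 appears in the loop above.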



3.1 Analysis of Phase 1 



Our analysis for Phase 1 borrows some elements from our analysis of randomized Shellsort [17], as this
algorithm has a somewhat similar structure of a schedule of random choices that gradually reduce in scope.

The Probabilistic Zero-One Principle. We begin our analysis with a probabilistic version of the zero-one
principle (e.g., see Knuth [24]).



Lemma 3.1 [6, 17, 35]: If a randomized data-oblivious sorting algorithm sorts any array of 0's and 1's of
size n with failure probability at most ε, then it sorts any array of size n with failure probability at most
ε(n + 1).

This lemma is only useful for randomized data-oblivious algorithms whose failure probabilities
are O(n^{−ρ}), for some constant ρ > 1, i.e., algorithms that succeed with very high
probability.
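The mechanism behind Lemma 3.1 can be checked empirically. For any fixed outcome of the random choices, a data-oblivious algorithm is just a fixed list of compare-exchange pairs, and compare-exchange commutes with any monotone map such as the 0-1 threshold x ↦ [x ≥ t]. So the output on A is sorted exactly when the output on every threshold image of A is sorted, and there are at most n + 1 distinct threshold images, which gives the factor n + 1 by a union bound. The Python sketch below (helper names hypothetical) verifies the commuting property on a small random instance.

```python
import random

def oblivious_network(n, rounds, seed):
    """Fix the random choices of `rounds` Spin-the-bottle-style rounds up front."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(rounds):
        for i in range(n):
            s = rng.randrange(n - 1)
            if s >= i:
                s += 1
            pairs.append((min(i, s), max(i, s)))
    return pairs

def run(pairs, a):
    """Apply the compare-exchange pairs, in order, to a copy of a."""
    a = list(a)
    for lo, hi in pairs:
        if a[lo] > a[hi]:
            a[lo], a[hi] = a[hi], a[lo]
    return a

def threshold(a, t):
    """Monotone 0-1 image of a: 1 where the value is >= t, else 0."""
    return [1 if x >= t else 0 for x in a]

def is_sorted(a):
    return all(a[k] <= a[k + 1] for k in range(len(a) - 1))

a = list(range(12))
random.Random(5).shuffle(a)
net = oblivious_network(len(a), rounds=2, seed=7)
out = run(net, a)
# Compare-exchange commutes with thresholding, for every threshold t.
print(all(run(net, threshold(a, t)) == threshold(out, t) for t in range(len(a) + 1)))  # True
```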

Shrinking Lemmas. As we move up and down A in a single pass, let us assume that we are considering
the effect of this pass on an array A of zeroes and ones, reasoning about how this pass impacts the ones
"moving up" in A. We can prove a number of useful "shrinking" lemmas for the number of ones that remain
in various regions (i.e., subarrays) of A during this pass. (Symmetric lemmas hold for the 0's with respect
to their downward movement in A.)

Lemma 3.2 (Sliding-Window Lemma): Let B be a subarray of A of size N, and let C be the subarray of
A of size 4N immediately after B. Suppose further there are k ≤ 4βN ones in B ∪ C, for 0 < β < 1. Let
k_1^{(c)} be the number of ones in B after a single up-and-down pass of Annealing sort with temperature 4N and
repetition factor c. Then

$$\Pr\left(k_1^{(c)} > \max\{2\beta^c N,\ 8e\log n\}\right) \le \min\{2^{-\beta^c N/2},\ n^{-4}\}.$$

Proof: For a one to remain in a given location in B, it must be matched with a one in each of its c compare-exchange
operations in B ∪ C (and note that this is the extent of possibilities, since the temperature is 4N).
Moreover, we may pessimistically assume each such c-ary test will occur independently for each possible
position in B with probability at most β^c. Thus,

$$E(k_1^{(c)}) \le \beta^c N.$$

Since k_1^{(c)} can, in this case, be viewed as the sum of N independent 0-1 random variables, we can apply a
Chernoff bound (e.g., see [30, 31]) to establish

$$\Pr\left(k_1^{(c)} > 2\beta^c N\right) < 2^{-\beta^c N/2},$$

for the case when our bound on E(k_1^{(c)}) is greater than 4e log n. When this bound is less than or equal to
4e log n, we can use a Chernoff bound to establish

$$\Pr\left(k_1^{(c)} > 8e\log n\right) < 2^{-2e\log n} < n^{-4}.$$ ■
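The expectation bound E(k_1^{(c)}) ≤ β^c N can be sanity-checked by simulation. The Python sketch below places k = ⌊4βN⌋ ones at random in a window consisting of B (N cells) followed by C (4N cells) and performs only the up-pass steps at the positions of B, which is the part the proof reasons about (later up-pass steps never write into B). Isolating B ∪ C in this way, ignoring ones that might arrive from the left of B, and the parameter values used are our assumptions for illustration.

```python
import random

def ones_left_in_B(N, beta, c, rng):
    """One trial: ones remaining in B after the up-pass steps at B's positions."""
    k = int(4 * beta * N)                 # k <= 4*beta*N ones in B followed by C
    cells = [1] * k + [0] * (5 * N - k)   # B = cells[0:N], C = cells[N:5N]
    rng.shuffle(cells)
    for i in range(N):                    # positions of B, left to right
        for _ in range(c):                # repetition factor c
            s = rng.randint(i + 1, i + 4 * N)   # temperature 4N; stays inside B and C
            if cells[i] > cells[s]:
                cells[i], cells[s] = cells[s], cells[i]
    return sum(cells[:N])

rng = random.Random(2024)
N, beta, c, trials = 40, 0.5, 2, 300
mean = sum(ones_left_in_B(N, beta, c, rng) for _ in range(trials)) / trials
print(mean, beta ** c * N)   # empirical mean versus the bound beta^c * N
```

The empirical mean typically lands well below β^c N, consistent with the proof's remark that the independence assumption is pessimistic.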
Lemma 3.3: Suppose we are given two regions, B and C, of A, of size N and αN, respectively, for
0 < α < 4, that are contained inside a subarray of A of size 4N, with B to the left of C, and let k = k_1 + k_2,
where k_1 (resp., k_2) is the number of ones in B (resp., C). Let k_1^{(c)} be the number of ones in B after a single
up-and-down pass of Annealing sort with temperature 4N and repetition factor c. Then

$$E(k_1^{(c)}) \le \left(1 - \frac{\alpha}{4} + \frac{k_2}{4N}\right)^c k_1.$$

Proof: A one may possibly remain in B after a single (up) pass of Annealing sort with temperature 4N,
with respect to a single random choice, if it is matched with a one in C or not matched with an element in
C at all. In a single random choice, with probability 1 − α/4, it is not matched with an element in C, and,
if matched with an element in C, which occurs with probability α/4, the probability that it is matched with
a one is k_2/(αN). Hence each random choice leaves the one in B with probability at most
1 − α/4 + (α/4)(k_2/(αN)) = 1 − α/4 + k_2/(4N), and applying this to each of its c choices and each of the
k_1 ones in B gives the bound. ■



Lemma 3.4 (Fractional-Depletion Lemma): Given two regions, B and C, in A, of size N and αN, respectively,
for 0 < α < 4, such that B and C are contained in a subarray of A of size 4N, with B to the
left of C, let k = k_1 + k_2, where k_1 and k_2 are the respective number of ones in B and C, and suppose
k ≤ 4βN, for 0 < β < 1. Let k_1^{(c)} be the number of ones in B after a single up-pass of Annealing sort with
temperature 4N and repetition factor c. Then

$$\Pr\left(k_1^{(c)} > \max\left\{2\left(1 - \frac{\alpha}{4} + \beta\right)^c N,\ 8e\log n\right\}\right) \le \min\left\{2^{-(1-\alpha/4+\beta)^c N/2},\ n^{-4}\right\}.$$



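Proof: By Lemma 3.3 applied to this scenario, and since k_1 ≤ N and k_2 ≤ k ≤ 4βN,

$$E(k_1^{(c)}) \le \left(1 - \frac{\alpha}{4} + \frac{k_2}{4N}\right)^c k_1 \le \left(1 - \frac{\alpha}{4} + \beta\right)^c N.$$

Since k_1^{(c)} can be viewed as the sum of k_1 independent 0-1 random variables, we can apply a standard
Chernoff bound (e.g., see [30, 31]) to establish

$$\Pr\left(k_1^{(c)} > 2\left(1 - \frac{\alpha}{4} + \beta\right)^c N\right) < 2^{-(1-\alpha/4+\beta)^c N/2},$$

for the case when our bound on E(k_1^{(c)}) is greater than 4e log n. When this bound is less than or equal to
4e log n, we can use a Chernoff bound to establish

$$\Pr\left(k_1^{(c)} > 8e\log n\right) < 2^{-2e\log n} < n^{-4}.$$ ■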

Lemma 3.5 (Startup Lemma): Given two regions, B and C, in A, of size N and αN, respectively, for
0 < α < 4, contained in a subarray of A of size 4N, with B to the left of C, let k = k_1 + k_2, where k_1 and
k_2 are the respective number of ones in B and C, and suppose k ≤ 4βN, for 0 < β < 1. Let k_1^{(c)} be the
number of ones in B after one up-pass of Annealing sort with temperature 4N and repetition factor c. Then,
for any constant λ > 0 such that 1 − α/4 + β − λ/4 < 1 − ε, for some constant 0 < ε < 1, there is a constant
c > 1 such that k_1^{(c)} < λN, with very high probability, provided N is Ω(log n).

Proof: By Lemma 3.3, so long as k_1 ≥ λN, we have k_2 ≤ 4βN − λN, so

$$E(k_1^{(c)}) \le \left(1 - \frac{\alpha}{4} + \frac{4\beta N - \lambda N}{4N}\right)^c N = \left(1 - \frac{\alpha}{4} + \beta - \frac{\lambda}{4}\right)^c N < (1-\varepsilon)^c N.$$
Of course, we are done as soon as k_1 < λN, and note that, for c ≥ log_{1/(1−ε)}(2/λ), we have E(k_1^{(c)}) ≤
λN/2. Thus, by a Chernoff bound, for such a constant c,

$$\Pr\left(k_1^{(c)} \ge \lambda N\right) = \Pr\left(k_1^{(c)} \ge 2\cdot\frac{\lambda N}{2}\right) < 2^{-\lambda N/4}.$$

The proof then follows from the fact that N is Ω(log n). ■

Having proven the essential properties for the compare-exchange passes done in each round of Phase 1
of Annealing sort, let us now turn to the actual analysis of Phase 1.

Bounding Dirtiness after each Iteration. In the 2d-th iteration of Phase 1, imagine that we partition the
array A into 2^d regions, A_0, A_1, ..., A_{2^d − 1}, each of size n/2^d. Moreover, every two iterations with the same
temperature split each region from the previous iteration into two equal-sized halves. Thus, the algorithm can
be visualized in terms of a complete binary tree, B, with n leaves. The root of B corresponds to a region
consisting of the entire array A and each leaf⁴ of B corresponds to an individual cell, a_i, in A, of size 1. Each
internal node v of B at depth d corresponds with a region, A_i, created in the 2d-th iteration of the algorithm,
and the children of v are associated with the two regions that A_i is split into during iteration 2(d + 1).

The desired output, of course, is to have each leaf value a_i = 0, for i < n − k, and a_i = 1, otherwise, where k
is the number of ones in A. We therefore refer to the transition from cell n − k − 1 to cell n − k on the last level of B as the crossover
point. We refer to any leaf-level region to the left of the crossover point as a low region and any leaf-level
region to the right of the crossover point as a high region. We say that a region, A_i, corresponding to an
internal node v of B, is a low region if all of v's descendants are associated with low regions. Likewise, a
region, A_i, corresponding to an internal node v of B, is a high region if all of v's descendants are associated
with high regions. Thus, we desire that low regions eventually consist of only zeroes and high regions
eventually consist of only ones. A region that is neither high nor low is mixed, since it is an ancestor of both
low and high regions. Note that there are no mixed leaf-level regions, however.

Also note that, since Phase 1 is data-oblivious, the algorithm does not behave differently depending
on whether a region is high, low, or mixed. Nevertheless, given the shrinking lemmas presented
above, we can reason about the actions of our algorithm on different regions in terms of each of these
types.

With each high (resp., low) region, A_i, define the dirtiness of A_i to be the number of zeroes (resp., ones)
that are present in A_i, that is, values of the wrong type for A_i. With each region, A_i, we associate a dirtiness
bound, δ(A_i), which is a desired upper bound on the dirtiness of A_i. For each region, A_i, at depth d in B,
let j be the number of regions from A_i to the crossover point or mixed region on that level. That is, if A_i
is next to the mixed region, then j = 1, and if A_i is next to a region next to the mixed region, then j = 2,
and so on. In general, if A_i is a low leaf-level region, then j = n − k − i − 1, and if A_i is a high leaf-level
region, then j = i − n + k. We define the desired dirtiness bound, δ(A_i), of A_i as follows:

• If j ≥ 2, then

• If j = 1, then 

• If A_i is a mixed region, then δ(A_i) = |A_i|.
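The following Python sketch illustrates the definitions of low, high, and mixed regions and of dirtiness (not the dirtiness bounds δ(A_i) themselves). It splits a 0-1 array into 2^d equal regions, classifies each region relative to the crossover point n − k, and reports its dirtiness. It assumes n is divisible by 2^d and uses 0-indexed cells; the function name is hypothetical.

```python
def regions_with_dirtiness(a, d):
    """Split 0-1 list a into 2**d equal regions; label each and give its dirtiness."""
    n = len(a)
    size = n >> d
    assert size >= 1 and size << d == n, "n must be a multiple of 2**d"
    cross = n - sum(a)          # cells 0 .. cross-1 should hold 0, the rest 1
    out = []
    for r in range(1 << d):
        lo, hi = r * size, (r + 1) * size
        ones = sum(a[lo:hi])
        if hi <= cross:
            out.append(("low", ones))               # dirtiness = ones present
        elif lo >= cross:
            out.append(("high", size - ones))       # dirtiness = zeroes present
        else:
            out.append(("mixed", None))             # bound is trivially |A_i|
    return out

print(regions_with_dirtiness([0, 0, 1, 0, 0, 1, 1, 0], 1))
# [('low', 1), ('mixed', None)]
print(regions_with_dirtiness([0, 0, 1, 0, 0, 1, 1, 0], 3))
# [('low', 0), ('low', 0), ('low', 1), ('low', 0), ('low', 0), ('high', 0), ('high', 0), ('high', 1)]
```

At the leaf level no region is mixed, and the total dirtiness of the low cells equals that of the high cells, since every misplaced one displaces a zero.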

⁴ This is a slight exaggeration, of course, since we terminate Phase 1 when regions have size O(log^6 n).
Thus, every mixed region trivially satisfies its desired dirtiness bound.

Because of our need for a high-probability bound, we will guarantee that each region A_i satisfies its
desired dirtiness bound with very high probability (w.v.h.p.) only if δ(A_i) ≥ 8e log n. If δ(A_i) < 8e log n, then we say A_i is an
extreme region, for, during our algorithm, this condition implies that A_i is relatively far from the crossover
point. We will show that the total dirtiness of all extreme regions is O(log^3 n) w.v.h.p. This motivates our
termination of Phase 1 when the temperature is O(log^6 n).

Lemma 3.6: Suppose A_i is a low (resp., high) region and Δ is the cumulative dirtiness of all regions to the
left (resp., right) of A_i. Then any compare-exchange pass over A can increase the dirtiness of A_i by at most Δ.

Proof: If Ai is a low (resp., high) region, then its dirtiness is measured by the number of ones (resp., zeroes) 
it contains. During any compare-exchange pass, ones can only move right, exchanging themselves with 
zeroes, and zeroes can only move left, exchanging themselves with ones. Thus, the only ones that can move 
into a low region are those to the left of it and the only zeroes that can move into a high region are those to 
the right of it. ■ 

The inductive claim, which we show in Appendix B holds with very high probability, is the following.

Claim 3.7: After iteration d, for each region A_i, the dirtiness of A_i is at most δ(A_i), provided A_i is not
extreme. The total dirtiness of all extreme regions is at most 8ed log^2 n.



3.2 Analysis of Phase 2 

Claim 3.7 is the essential condition we need to hold at the start of Phase 2. In this section, we analyze the
degree to which Phase 2 further increases the sortedness of the array A from this point.

At the beginning of Phase 2, the total dirtiness of all extreme regions is at most 8e log^3 n, and the size
of each such region is g log^6 n, for g = 64e^2. Without loss of generality, let us consider a one in an extreme
low region. The probability that such a one fails to be compared with a zero to its right in a round of Phase 2
is at most 1/N^{1/2}, provided g is large enough. Thus, with r = h log n / log log n, the probability such a one
fails to be compared with a zero after r random comparisons at distance N is at most

$$\left(\frac{1}{N^{1/2}}\right)^{r} = \frac{1}{N^{(h/2)\log n/\log\log n}} \le \frac{1}{(\log n)^{(h/2)\log n/\log\log n}} = \frac{1}{n^{h/2}},$$

since N ≥ log n during Phase 2. Thus, with very high probability, there are no dirty extreme regions after
one round of Phase 2.

Consider next a non-extreme low region that is not mixed. By Claim 3.7, the dirtiness of such a region,
and all regions to its left, is, with very high probability, at most 7N/W. Thus, 

*c*h < (i-jj)'* 



/ ^ \ h log 71/ log log n 



< 
< 



Therefore, by a Chernoff bound, for d and n large enough, 

( eiV )dlog7V 



Pr /V r) > d\c,P N\ < {G ' 

V 1 ^ Ulu & jv y - (g(20/3)dlogn/loglogn)dlogAr 
1

— gd log n 
1 

< "J- 



Note that in the next round after this, such a region will become completely clean, w.v.h.p., since its
dirtiness is below 1/N^{1/2} w.v.h.p.



In addition, by Lemma 3.5, since N is Ω(log n) throughout Phase 2, w.v.h.p., the dirtiness of
regions separate from a mixed region is at most N/6. Thus, the above analysis applies to them as well, once
they are separate from a mixed region.

Therefore, by the end of Phase 2, w.v.h.p., the only dirty regions are either mixed or within distance 2 of
a mixed region. In other words, the total dirtiness of the array A at the end of Phase 2 is O(log n).



3.3 Analysis of Phase 3 

Each round of Phase 3 is guaranteed to decrease the dirtiness of A by at least 1 so long as A is not completely
clean. This property is similar to the reason why Bubble sort works. Namely, using the zero-one principle,
note that the leftmost one in A will always move right until it encounters another one. Thus, a single up-pass
in A eliminates the leftmost one having a zero somewhere to its right. Likewise, a single down-pass in A
eliminates the rightmost zero having a one somewhere to its left. Thus, since the total dirtiness of A is
O(log n) w.v.h.p. and Phase 3 performs g log n rounds (for a large enough constant g), Phase 3 will completely sort A w.v.h.p.
Therefore, we have the following.

Theorem 3.8: Given an array A of n elements, there is an annealing schedule that causes the three phases
of Annealing sort to run in O(n log n) time and leave A sorted with very high probability.
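At temperature 1 the random choice in Figure 2 is forced (s = i + 1 on the up-pass and s = i − 1 on the down-pass), so with one repetition each Phase 3 round is a deterministic shaker pass. The Python sketch below, on 0-1 arrays, runs such rounds and reports how the dirtiness changes, measuring dirtiness as the number of ones lying in the first (number of zeroes) cells. The helper names are ours.

```python
def up_pass_t1(a):
    """Up-pass at temperature 1 with one repetition: s is forced to be i + 1."""
    for i in range(len(a) - 1):
        if a[i] > a[i + 1]:
            a[i], a[i + 1] = a[i + 1], a[i]

def down_pass_t1(a):
    """Down-pass at temperature 1 with one repetition: s is forced to be i - 1."""
    for i in range(len(a) - 1, 0, -1):
        if a[i - 1] > a[i]:
            a[i - 1], a[i] = a[i], a[i - 1]

def total_dirtiness(a):
    """Ones lying in the first (number of zeroes) cells, i.e., in low cells."""
    return sum(a[:a.count(0)])

a = [0, 1, 1, 0, 1, 0, 0, 1, 0, 1]
while total_dirtiness(a) > 0:
    before = total_dirtiness(a)
    up_pass_t1(a)
    down_pass_t1(a)
    print(before, "->", total_dirtiness(a))
```

Checking every 0-1 array of small length confirms that each up-and-down round strictly lowers the dirtiness until the array is sorted, so O(log n) dirtiness is cleared by O(log n) such rounds.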



4 Conclusion 

We have given two related data-oblivious sorting algorithms based on iterated passes of round-robin random
comparisons. The first, Spin-the-bottle sort, requires an expected Ω(n^2 log n) time to sort some inputs, and
in O(n^2 log n) time it will sort any given input sequence with very high probability. The second, Annealing
sort, on the other hand, can be designed to run in O(n log n) time and sort with very high probability.
Some interesting open problems include the following.

• Our analysis is, in many ways, overly pessimistic, in order to show that Annealing sort succeeds with
very high probability. Is there a simpler and shorter annealing sequence that causes Annealing sort to
run in O(n log n) time and sort with very high probability?

• Both Spin-the-bottle sort and Annealing sort are highly sequential. Is there a simple⁵ randomized
sorting network with depth O(log n) and size O(n log n) that sorts any given input sequence with
very high probability?

• Throughout this paper, we have assumed that compare-exchange operations always return the correct
answer. But there are some scenarios when one would want to be tolerant of faulty compare-exchange
operations (e.g., see [2, 8, 16]). Is there a version of Annealing sort that runs in O(n log n) time and
sorts with high probability even if comparisons return a faulty answer uniformly at random with
probability strictly less than 1/2?



⁵ Leighton and Plaxton [27] describe a randomized sorting network that sorts with very high probability, which is simpler than
the AKS sorting network [1], but is still somewhat complicated. So the open problem would be to design a sorting network
construction that is clearly simpler than the construction of Leighton and Plaxton.






Acknowledgments 

This research was supported in part by the National Science Foundation under grants 0724806, 0713046, 
and 0847968, and by the Office of Naval Research under MURI grant N00014-08-1-1015. 

References 

[1] M. Ajtai, J. Komlos, and E. Szemeredi. Sorting in c log n parallel steps. Combinatorica, 3:1-19, 1983.

[2] S. Assaf and E. Upfal. Fault tolerant sorting networks. SIAM J. Discrete Math., 4(4):472-480, 1991.

[3] K. E. Batcher. Sorting networks and their applications. In Proc. 1968 Spring Joint Computer Conf., 
pages 307-314, Reston, VA, 1968. AFIPS Press. 

[4] A. Ben-David, N. Nisan, and B. Pinkas. FairplayMP: A system for secure multi-party computation. In CCS '08: Proceedings of the 15th ACM conference on Computer and communications security, pages 257-266, New York, NY, USA, 2008. ACM.

[5] T. Biedl, T. Chan, E. D. Demaine, R. Fleischer, M. Golin, J. A. King, and J. I. Munro. Fun-sort, or the chaos of unordered binary search. Discrete Appl. Math., 144(3):231-236, 2004.

[6] D. T. Blackston and A. Ranade. Snakesort: A family of simple optimal randomized sorting algorithms. In ICPP '93: Proceedings of the 1993 International Conference on Parallel Processing, pages 201-204, Washington, DC, USA, 1993. IEEE Computer Society.

[7] A. Boneh and M. Hofri. The coupon-collector problem revisited: a survey of engineering problems and computational methods. Communications in Statistics. Stochastic Models, 13(1):39-66, 1997.

[8] M. Braverman and E. Mossel. Noisy sorting without resampling. In SODA '08: Proceedings of the 
19th ACM-SIAM Symposium on Discrete algorithms, pages 268-276, Philadelphia, PA, USA, 2008. 
Society for Industrial and Applied Mathematics. 

[9] B. Brejova. Analyzing variants of Shellsort. Information Processing Letters, 79(5):223-227, 2001.

[10] R. Canetti, Y. Lindell, R. Ostrovsky, and A. Sahai. Universally composable two-party and multi-party secure computation. In STOC '02: Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, pages 494-503, New York, NY, USA, 2002. ACM.

[11] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, 
Cambridge, MA, 2nd edition, 2001. 

[12] R. Cypher. A lower bound on the size of Shellsort sorting networks. SIAM J. Comput., 22(1):62-71, 
1993. 

[13] W. Dobosiewicz. An efficient variation of bubble sort. Inf. Process. Lett., 11(1):5-6, 1980.

[14] W. Du and M. J. Atallah. Secure multi-party computation problems and their applications: a review 
and open problems. In NSPW '01: Proceedings of the 2001 workshop on New security paradigms, 
pages 13-22, New York, NY, USA, 2001. ACM. 

[15] W. Du and Z. Zhan. A practical approach to solve secure multi-party computation problems. In 
NSPW '02: Proceedings of the 2002 workshop on New security paradigms, pages 127-135, New 
York, NY, USA, 2002. ACM. 

[16] U. Feige, P. Raghavan, D. Peleg, and E. Upfal. Computing with noisy information. SIAM J. Comput., 23(5):1001-1018, 1994.

[17] M. T. Goodrich. Randomized Shellsort: A simple oblivious sorting algorithm. In Proceedings of the 
ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1-16. SIAM, 2010. 






[18] M. T. Goodrich and R. Tamassia. Algorithm Design: Foundations, Analysis, and Internet Examples. 
John Wiley & Sons, New York, NY, 2002. 

[19] H. Gruber, M. Holzer, and O. Ruepp. Sorting the slow way: an analysis of perversely awful randomized sorting algorithms. In FUN '07: Proceedings of the 4th international conference on Fun with algorithms, pages 183-197, Berlin, Heidelberg, 2007. Springer-Verlag.

[20] C. A. R. Hoare. Quicksort. Comput. J., 5(1):10-15, 1962.

[21] J. Incerpi and R. Sedgewick. Improved upper bounds on Shellsort. J. Comput. Syst. Sci., 31(2):210-224, 1985.

[22] J. Incerpi and R. Sedgewick. Practical variations of Shellsort. Inf. Process. Lett., 26(1):37-43, 1987.

[23] T. Jiang, M. Li, and P. Vitanyi. A lower bound on the average-case complexity of Shellsort. J. ACM, 47(5):905-911, 2000.

[24] D. E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming. 
Addison-Wesley, Reading, MA, 1973. 

[25] P. J. M. Laarhoven and E. H. L. Aarts, editors. Simulated annealing: theory and applications. Kluwer 
Academic Publishers, Norwell, MA, USA, 1987. 

[26] F. T. Leighton. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. 
Morgan-Kaufmann, San Mateo, CA, 1992. 

[27] T. Leighton and C. G. Plaxton. Hypercubic sorting networks. SIAM J. Comput., 27(1):1-47, 1998.

[28] D. Malkhi, N. Nisan, B. Pinkas, and Y. Sella. Fairplay: a secure two-party computation system. In SSYM'04: Proceedings of the 13th conference on USENIX Security Symposium, pages 20-20, Berkeley, CA, USA, 2004. USENIX Association.

[29] U. Maurer. Secure multi-party computation made simple. Discrete Appl. Math., 154(2):370-381, 
2006. 

[30] M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms and 
Probabilistic Analysis. Cambridge University Press, New York, NY, USA, 2005. 

[31] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, New York, NY,
1995. 

[32] M. Paterson. Improved sorting networks with O(log N) depth. Algorithmica, 5(1):75-92, 1990.

[33] C. G. Plaxton and T. Suel. Lower bounds for Shellsort. J. Algorithms, 23(2):221-240, 1997. 

[34] V. R. Pratt. Shellsort and sorting networks. PhD thesis, Stanford University, Stanford, CA, USA, 
1972. 

[35] S. Rajasekaran and S. Sen. PDM sorting algorithms that take a small number of passes. In IPDPS 
'05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium 
(IPDPS '05) - Papers, page 10, Washington, DC, USA, 2005. IEEE Computer Society. 

[36] R. Sedgewick. Algorithms in C++. Addison-Wesley, Reading, MA, 1992. 

[37] R. Sedgewick. Analysis of Shellsort and related algorithms. In ESA '96: Proceedings of the Fourth Annual European Symposium on Algorithms, pages 1-11, London, UK, 1996. Springer-Verlag.

[38] J. Seiferas. Sorting networks of logarithmic depth, further simplified. Algorithmica, 53(3):374-384, 2009.

[39] D. L. Shell. A high-speed sorting procedure. Commun. ACM, 2(7):30-32, 1959. 

[40] G. Wang, T. Luo, M. T. Goodrich, W. Du, and Z. Zhu. Bureaucratic protocols for secure two-party 
sorting, selection, and permuting. In 5th ACM Symposium on Information, Computer and 
Communications Security (ASIACCS), pages 226-237. ACM, 2010. 






[41] M. A. Weiss and R. Sedgewick. Bad cases for shaker-sort. Information Processing Letters, 28(3):133-136, 1988.

[42] J. Williams. Algorithm 232: Heapsort. Commun. ACM, 7:347-348, 1964.






A Proving the Correctness of Spin-the-bottle Sort 



In this appendix, we prove Theorem 2.2, which states that, given an array A of n elements, the three phases of Spin-the-bottle sort run in O(n^2 log n) time and sort A with very high probability.

The proof is based on showing that we can achieve each of the milestones marking each phase in O(n^2 log n) time or better.

Phase 1. Let X_j be a random variable that equals the number of inversions resolved in round j of Phase 1, and let X_{i,j} denote an indicator random variable that is 1 iff, in iteration (round) j of the algorithm, we perform a comparison between A[i] and an element that formed an inversion with A[i] at the beginning of round j. Thus,

$$X_j \ge \frac{1}{2}\sum_{i=1}^{n} X_{i,j},$$

since each inversion involves two elements of A. The X_{i,j}'s are independent. Furthermore,

$$E(X_{i,j}) = \frac{m_{i,j}}{n-1},$$

where m_{i,j} denotes the number of inversions that exist at the beginning of round j and involve A[i]. Therefore,

$$E(X_j) \ge \frac{1}{2}\sum_{i=1}^{n}\frac{m_{i,j}}{n-1} = \frac{M_j}{n-1},$$

where M_j is the number of inversions in A that exist at the beginning of round j (each inversion is counted twice among the m_{i,j}'s, so \sum_i m_{i,j} = 2M_j). Thus, by a well-known Chernoff bound,

$$\Pr\left(X_j < \frac{M_j}{2(n-1)}\right) < \left(\frac{e^{-1/2}}{(1/2)^{1/2}}\right)^{2M_j/(n-1)} < 2^{-M_j/(3(n-1))} < n^{-4},$$

since M_j ≥ 12n log n while we are in Phase 1. So we may assume, with probability at least 1 - c/n^3, that the following recurrence relation holds during Phase 1, for all 1 ≤ j ≤ cn, for any constant c ≥ 1:

Therefore, with probability at least 1 - 4/n^3, there are at most 4n rounds during Phase 1 of Spin-the-bottle sort, since M_1 = M < n^2 and M_j ≥ 12n log n for all j during Phase 1. That is, with very high probability, Phase 1 runs in O(n^2) time.
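The constants in the Phase 1 tail bound can be checked numerically. The sketch below assumes the standard lower-tail Chernoff form Pr(X < (1 - δ)μ) < (e^{-δ}/(1 - δ)^{1-δ})^μ. It applies this form to the sum of the X_{i,j}'s, whose mean is 2M_j/(n - 1), with δ = 1/2, and takes logarithms base 2. The function names are illustrative.

```python
import math

def chernoff_lower_factor(delta):
    """Factor f in the lower-tail Chernoff bound Pr(X < (1 - delta) mu) < f**mu,
    where f = e^{-delta} / (1 - delta)^{1 - delta}."""
    return math.exp(-delta) / (1 - delta) ** (1 - delta)

def phase1_tail(n, m):
    """The bound 2^{-m/(3(n-1))} from the text, for M_j = m inversions."""
    return 2 ** (-m / (3 * (n - 1)))

# The sum of the X_{i,j}'s has mean 2*M_j/(n-1) and delta = 1/2, so the
# Chernoff bound is (f**2)**(M_j/(n-1)); f**2 = 2/e must be below 2^{-1/3}.
f_squared = chernoff_lower_factor(0.5) ** 2
print(f_squared, 2 ** (-1 / 3))

# While M_j >= 12 n log2 n (Phase 1), the tail should be at most n^{-4}.
for n in (2, 10, 1000, 10**6):
    m = 12 * n * math.log2(n)
    print(n, phase1_tail(n, m), n ** -4.0)
```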

Phase 2. For this phase, let X_j and X_{i,j} denote random variables defined as in our analysis of Phase 1, with the index j reset to 1 for Phase 2. In this case,

$$E(X_j) \ge \frac{M_j}{n-1} \ge 12.$$

Thus, by a Chernoff bound similar to the one used for analyzing Phase 1,

$$\Pr(X_j < 6) \le \Pr\left(X_j < \frac{M_j}{2(n-1)}\right) < 2^{-M_j/(3(n-1))} \le 2^{-4},$$

since we are in Phase 2. That is, with probability at most 1/16 we resolve fewer than 6 inversions in round j of Phase 2. Call round j a failure in this case, and call it a success if it resolves at least 6 inversions. Let Y_j be an indicator random variable that is 1 iff we resolve fewer than 6 inversions in round j of Phase 2. If j is larger than the number of rounds in Phase 2, then let Y_j instead be an independent random variable that is 1 with probability 1/16. Thus, the number of failure rounds in the first (at most) 4n log n rounds of Phase 2 is at most

$$Y = \sum_{j=1}^{4n\log n} Y_j.$$

Note that E(Y) ≤ (1/4)n log n. Thus, by a standard Chernoff bound,

$$\Pr(Y > 2n\log n) = \Pr\left(Y > 8\cdot\tfrac{1}{4}n\log n\right) < \left(\frac{e^{7}}{8^{8}}\right)^{(1/4)n\log n} < 2^{-2n\log n} = n^{-2n}.$$

Note, in addition, that there can be, in total, at most 2n log n successful rounds in Phase 2, since each successful round resolves at least 6 inversions and fewer than 12n log n inversions remain at the start of Phase 2. Thus, with very high probability, there are only O(n log n) rounds in Phase 2. That is, with very high probability, Phase 2 runs in O(n^2 log n) time.
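The Phase 2 constants can be checked in the same way. Here the mean of Y is at most μ = (1/4)n log n and the cutoff 2n log n equals 8μ. The sketch therefore assumes the upper-tail form Pr(Y > (1 + δ)μ) < (e^δ/(1 + δ)^{1+δ})^μ with δ = 7. The round-count helper is a hypothetical restatement of the counting in the text, with logarithms taken base 2.

```python
import math

def chernoff_upper_factor(delta):
    """Factor f in the upper-tail Chernoff bound Pr(Y > (1 + delta) mu) < f**mu,
    where f = e^{delta} / (1 + delta)^{1 + delta}."""
    return math.exp(delta) / (1 + delta) ** (1 + delta)

def phase2_round_budget(n):
    """Round counts used in the Phase 2 argument, with log taken base 2.

    Returns (max successful rounds, failure-round cutoff, expected failures).
    """
    L = n * math.log2(n)
    max_successes = (12 * L) / 6      # each success resolves >= 6 of < 12 n log n inversions
    failure_cutoff = 2 * L            # exceeded with probability < n^{-2n}
    expected_failures = (4 * L) / 16  # 4 n log n rounds, each failing w.p. <= 1/16
    return max_successes, failure_cutoff, expected_failures

# With mu = (1/4) n log n, the cutoff 2 n log n is 8*mu, so delta = 7, and
# f**mu <= 2^{-2 n log n} = (2^{-8})**mu holds exactly when f <= 2^{-8}.
f = chernoff_upper_factor(7)
print(f, 2 ** -8)
```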

Phase 3. The analysis for this phase is similar to that for the coupon collector's problem (e.g., see [7]). At the start of this phase, there are fewer than 12n inversions that remain in A. Note that, for any such inversion, x, the probability that x is resolved in a round of Phase 3 is at least 1/n (see footnote 6). Let Z_x^r be the event that x is not resolved after r rounds of Phase 3. Thus,

$$\Pr(Z_x^r) \le \left(1 - \frac{1}{n}\right)^r \le e^{-r/n}.$$

Let R denote the number of rounds needed to resolve all the inversions in Phase 3. Then, for c > 2,

$$\Pr(R > cn\ln n) \le \Pr\left(\bigcup_x Z_x^{cn\ln n}\right) \le \sum_x \Pr\left(Z_x^{cn\ln n}\right) \le 12n\,e^{-c\ln n} = \frac{12}{n^{c-1}}.$$

Thus, with very high probability, R is O(n log n); hence, with very high probability, Phase 3 runs in O(n^2 log n) time. This completes the proof.
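The Phase 3 union bound can also be evaluated directly. This sketch assumes the worst case allowed by the text: 12n remaining inversions, each surviving a round with probability at most 1 - 1/n. It computes (1 - 1/n)^r through `math.log1p` for numerical stability. The function name is hypothetical.

```python
import math

def phase3_union_bound(n, c, inversions=None):
    """Union bound on Pr(R > c n ln n) from the Phase 3 argument.

    Each of the (at most 12n) remaining inversions survives one round with
    probability at most 1 - 1/n, so it survives r = c n ln n rounds with
    probability at most (1 - 1/n)**r <= e^{-r/n} = n^{-c}.
    """
    if inversions is None:
        inversions = 12 * n
    r = c * n * math.log(n)
    survive_one = math.exp(r * math.log1p(-1.0 / n))  # (1 - 1/n)**r, computed stably
    return inversions * survive_one

for n in (10, 1000, 10**6):
    print(n, phase3_union_bound(n, 3), 12 / n ** 2)
```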



6 In fact, the probability that x is resolved in a round of Phase 3 is equal to 2/(n-1) - 1/(n-1)^2, since each inversion has two chances of being resolved during a round.
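For intuition about the three phases, the following simulation runs Spin-the-bottle sort and records the inversion count M_j before each round. It assumes the round structure from the Introduction: for each i in turn, A[i] is compare-exchanged with A[s] for an index s chosen uniformly among the other n - 1 positions. A compare-exchange only swaps an inverted pair, and such a swap always removes at least one inversion, so the recorded counts never increase. The helper names are hypothetical.

```python
import random

def count_inversions(a):
    """M = number of pairs i < j with a[i] > a[j] (quadratic; fine for small n)."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])

def spin_the_bottle_round(a, rng):
    """One round: for each i, compare-exchange a[i] with a random a[s], s != i."""
    n = len(a)
    for i in range(n):
        s = rng.randrange(n - 1)
        if s >= i:
            s += 1  # s is now uniform over the n - 1 indices other than i
        lo, hi = min(i, s), max(i, s)
        if a[lo] > a[hi]:
            a[lo], a[hi] = a[hi], a[lo]

def run(a, rounds, rng):
    """Run the given number of rounds, recording M_j before each round
    and the final count at the end."""
    history = []
    for _ in range(rounds):
        history.append(count_inversions(a))
        spin_the_bottle_round(a, rng)
    history.append(count_inversions(a))
    return history
```

For example, `run(list(range(20))[::-1], 2000, random.Random(7))` starts from the reversed array with M_1 = 190. On this input the counts fall quickly at first and then slowly, in the coupon-collector fashion of Phase 3.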
B Proof of the Inductive Claim for Phase 1 of Annealing Sort



In this appendix, we prove Claim 3.7, which states that, after iteration d, for each region A_i, the dirtiness of A_i is at most δ(A_i), provided A_i is not extreme, and that the total dirtiness of all extreme regions is at most 8ed log^2 n. As mentioned above, this analysis for Phase 1 of Annealing sort borrows from our analysis of randomized Shellsort [17], as there is a similar structure to our inductive argument even though the fine details are quite different.

Let us begin at the first round, which we are viewing in terms of two regions, A_1 and A_2, of size N = n/2 each. Suppose that k ≤ n - k, where k is the number of ones, so that A_1 is a low region and A_2 is either a high region (i.e., if k = n - k) or A_2 is mixed (the case when k > n - k is symmetric). Let k_1 (resp., k_2) denote the number of ones in A_1 (resp., A_2), so k = k_1 + k_2. By the Startup Lemma (3.5), the dirtiness of A_1 will be at most n/12, with very high probability. In this case (using the notation of that lemma and viewing A as existing inside a larger array of size 2n), α = 1, β ≤ 1/4, and λ = 1/6, so 1 - α/4 + β - λ ≤ 1 - 1/6. Note that this satisfies the desired dirtiness of A_1, since δ(A_1) = n/10 in this case. A similar argument applies to A_2 if it is a high region, and if A_2 is mixed, it trivially satisfies its desired dirtiness bound. Also, assuming n is large enough, there are no extreme regions (if n is so small that A_1 is extreme, we can immediately switch to Phase 2). The next round of Annealing sort (with temperature 2n) can only improve the dirtiness in A. Thus, we satisfy the base case of our inductive argument: the dirtiness bounds for the two children of the root of B are satisfied with (very) high probability. Similar arguments prove the inductive claim for iterations 3 and 4, for N = n/2^2 and temperature n, and for iterations 5 and 6, for N = n/2^3 and temperature n/2.

Let us now consider a general inductive step. Let us assume that, with very high probability, we have satisfied Claim 3.7 for the regions on level d ≥ 3, and let us now consider the transition to level d + 1, which occurs in iterations 2d + 1 and 2d + 2. In addition, we terminate this line of reasoning when the region size, n/2^d, becomes less than 64e^2 log^6 n.

Extreme Regions. Let us begin with the bound for the dirtiness of extreme regions at depth d + 1, considering the effect of iteration 2d + 1. Note that, by Lemma 3.6, regions that were extreme after iteration 2d will be split into regions in iteration 2d + 1 that contribute no new amounts of dirtiness to pre-existing extreme regions. That is, extreme regions get split into extreme regions. Thus, the new dirtiness for extreme regions can come only from regions that were not extreme on level d of B but are now splitting into extreme regions on level d + 1, which we call freshly extreme regions. Suppose, then, that A_i is such a region, say, with a parent, A_p, which is j regions from the mixed region on level d. Then the desired dirtiness bound of A_i's parent region, A_p, is δ(A_p) = n/2^{d+j+3} ≥ 8e log n, by Claim 3.7, since A_p is not extreme. A_p has (low-region) children, A_i and A_{i+1}, that have desired dirtiness bounds of δ(A_i) = n/2^{d+1+2j+4} or δ(A_i) = n/2^{d+1+2j+3}, and of δ(A_{i+1}) = n/2^{d+1+2j+3} or δ(A_{i+1}) = n/2^{d+1+2j+2}, depending on whether the mixed region on level d + 1 has an odd or even index. Moreover, A_i (and possibly A_{i+1}) is freshly extreme, so n/2^{d+1+2j+4} < 8e log n, which implies that j > (log n - d - log log n - 10)/2. Note also that there are O(log n) new regions on this level that are just now becoming extreme, since n/2^d ≥ 64e^2 log^6 n and n/2^{d+j+3} ≥ 8e log n imply j < log n - d. So let us consider the two freshly extreme regions, A_i and A_{i+1}, in turn, and how a pass of Annealing sort affects them (after that pass, they will collectively satisfy the extreme-region part of Claim 3.7).
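The threshold on j for freshly extreme regions can be checked by brute force. The sketch below takes logarithms base 2 and uses the freshly-extreme condition n/2^{d+1+2j+4} < 8e log n. It finds the smallest such j and compares it with (log n - d - log log n - 10)/2. The function names are hypothetical.

```python
import math

def min_fresh_extreme_j(n, d):
    """Smallest j >= 0 with n / 2**(d + 1 + 2j + 4) < 8 e log2 n, i.e. the
    first child A_i (parent j regions from the mixed region) whose desired
    dirtiness bound falls below the extreme threshold 8 e log n."""
    threshold = 8 * math.e * math.log2(n)
    j = 0
    while n / 2 ** (d + 1 + 2 * j + 4) >= threshold:
        j += 1
    return j

def claimed_lower_bound(n, d):
    """(log n - d - log log n - 10) / 2, the bound stated in the text."""
    return (math.log2(n) - d - math.log2(math.log2(n)) - 10) / 2

for n in (2**20, 2**40, 2**64):
    for d in (3, 5, 10):
        print(n.bit_length() - 1, d, min_fresh_extreme_j(n, d), claimed_lower_bound(n, d))
```

The strict inequality always holds, because the condition forces 2j > log n - d - log log n - 8 - log e, and log e < 2.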

Region A_i. Consider the worst case for δ(A_i), namely, that δ(A_i) = n/2^{d+1+2j+4}. Since A_i is a left child of A_p, A_i could get at most n/2^{d+j+3} + 8ed log^2 n ones from regions left of A_i, by Lemma 3.6. In addition, A_i and A_{i+1} could inherit at most δ(A_p) = n/2^{d+j+3} ones from A_p. Thus, if we let N denote the size of A_i, i.e., N = n/2^{d+1}, then A_i and A_{i+1} together have at most N/2^{j+1} + 3N^{1/2} < N/2^j ones, since we stop Phase 1 when N < 64e^2 log^6 n. In addition, assuming j ≥ 4, regions A_{i+2} and A_{i+3} may inherit at most n/2^{d+j+2} ones from their parent, and region A_{i+4} may inherit at most n/2^{d+j+1} ones from its parent. Therefore, by the Sliding-Window Lemma (3.2), with β = 5/2^{j+3} < 1/2^j, the following condition holds with probability at least 1 - cn^{-4},

$$k_1^{(c)} \le \max\{2\beta^c N,\ 8e\log n\},$$

where k_1^{(c)} is the number of ones left in A_i after an up-pass of Annealing sort with temperature 4N and repetition factor c. Note that, if k_1^{(c)} ≤ 8e log n, then we have satisfied the desired dirtiness for A_i. Alternatively, so long as c ≥ 4 and j ≥ 4, then w.v.h.p.,

$$k_1^{(c)} \le 2\beta^c N \le \frac{n}{2^{d+jc}} \le \frac{n}{2^{d+1+2j+4}} < 8e\log n.$$



Region A_{i+1}. Consider the worst case for δ(A_{i+1}), namely δ(A_{i+1}) = n/2^{d+1+2j+3}. Since, in this case, A_{i+1} is a right child of A_p, A_{i+1} could get at most n/2^{d+j+3} + 8ed log^2 n ones from regions left of A_{i+1}, by Lemma 3.6, plus A_{i+1} could inherit at most δ(A_p) = n/2^{d+j+3} ones from A_p itself. In addition, since j ≥ 3, A_{i+2} and A_{i+3} could inherit at most n/2^{d+j+2} ones from their parent, and A_{i+4} and A_{i+5} could inherit at most n/2^{d+j+1} ones from their parent. Thus, if we let N denote the size of A_{i+1}, i.e., N = n/2^{d+1}, then A_{i+1} through A_{i+5} together have at most N/2^{j+1} + 3N^{1/2} + N/2^{j+1} + N/2^j < 4N/2^j ones, since we stop Phase 1 when N < 64e^2 log^6 n and j ≥ 4. By the Sliding-Window Lemma (3.2), applied with β = 1/2^j, the following condition holds with probability at least 1 - cn^{-4},

$$k_1^{(c)} \le \max\{2\beta^c N,\ 8e\log n\},$$

where k_1^{(c)} is the number of ones left in A_{i+1} after a pass of Annealing sort with repetition factor c and temperature 4N. Note that, if k_1^{(c)} ≤ 8e log n, then we have satisfied the desired dirtiness bound for A_{i+1}. Alternatively, so long as c ≥ 4 and j ≥ 4, then w.v.h.p.,

$$k_1^{(c)} \le 2\beta^c N \le \frac{n}{2^{d+jc}} \le \frac{n}{2^{d+1+2j+4}} < 8e\log n.$$



Therefore, if a low region A_i or A_{i+1} becomes freshly extreme in iteration 2d + 1, then, w.v.h.p., its dirtiness is at most 8e log n. Since there are at most log n freshly extreme regions created in iteration 2d + 1, this implies that the total dirtiness of all extreme low regions in iteration 2d + 1 is at most 8e(d + 1) log^2 n, w.v.h.p., after the right-moving pass of Phase 1, by Claim 3.7. Likewise, by symmetry, a similar claim applies to the high regions after the left-moving pass of Phase 1. Moreover, by Lemma 3.6, these extreme regions will continue to satisfy Claim 3.7 after this.

Non-extreme Regions not too Close to the Crossover Point. Let us now consider non-extreme regions on level d + 1 that are at least two regions away from the crossover point on level d + 1. Consider, without loss of generality, a low region, A_p, on level d, which is j regions from the crossover point on level d, with A_p having (low-region) children, A_i and A_{i+1}. These children have desired dirtiness bounds of δ(A_i) = n/2^{d+1+2j+4} or δ(A_i) = n/2^{d+1+2j+3}, and of δ(A_{i+1}) = n/2^{d+1+2j+3} or δ(A_{i+1}) = n/2^{d+1+2j+2}, depending on whether the mixed region on level d + 1 has an odd or even index. By Lemma 3.6, it suffices to show w.v.h.p. that the dirtiness of each such A_i (resp., A_{i+1}) is at most δ(A_i)/3 (resp., δ(A_{i+1})/3) after the up-and-down pass of Phase 1. Then, no matter how many more ones come into A_i or A_{i+1} from the left during the rest of iteration 2d + 1 (and 2d + 2), they will satisfy their desired dirtiness bounds.

Let us consider the different region types (always taking the most difficult choice for each desired dirtiness in order to avoid additional cases):

• Type 1: δ(A_i) = n/2^{d+1+2j+4}, with j ≥ 4. Since A_i is a left child of A_p, in this case, A_i could get at most n/2^{d+j+3} + 8ed log^2 n ones from regions left of A_i, by Lemma 3.6. In addition, A_i and A_{i+1} could inherit at most δ(A_p) = n/2^{d+j+3} ones from A_p. Thus, if we let N denote the size of A_i, i.e., N = n/2^{d+1}, then A_i and A_{i+1} together have at most N/2^{j+1} + 3N^{1/2} < N/2^j ones, since we stop Phase 1 when N < 64e^2 log^6 n. In addition, A_{i+2} and A_{i+3} inherit at most n/2^{d+j+2} ones from their parent. Likewise, A_{i+4} inherits at most n/2^{d+j+1} ones from its parent. Thus, A_i through A_{i+4} inherit at most N/2^j + N/2^{j+1} + N/2^j < N/2^{j-2} ones. Thus, we can apply the Sliding-Window Lemma (3.2), with β = 1/2^j, so that the following condition holds with probability at least 1 - n^{-4}, provided c ≥ 4 and j ≥ 4:

$$k_1^{(c)} \le 2\beta^c N \le \frac{n}{2^{d+jc}} \le \frac{n}{3\cdot 2^{d+1+2j+4}} = \frac{\delta(A_i)}{3},$$

where k_1^{(c)} is the number of ones left in A_i after a pass of Annealing sort with repetition factor c.
• Type 2: δ(A_{i+1}) = n/2^{d+1+2j+3}, with j ≥ 4. Since A_{i+1} is a right child of A_p, in this case, A_{i+1} could get at most n/2^{d+j+3} + 8ed log^2 n ones from regions left of A_{i+1}, by Lemma 3.6, plus A_{i+1} could inherit at most δ(A_p) = n/2^{d+j+3} ones from A_p. In addition, since j > 2, A_{i+2} and A_{i+3} could inherit at most n/2^{d+j+2} ones from their parent. Thus, if we let N denote the size of A_{i+1}, i.e., N = n/2^{d+1}, then A_{i+1}, A_{i+2}, and A_{i+3} together have at most N/2^j + 3N^{1/2} < N/2^{j-1} ones, since we stop Phase 1 when N < 64e^2 log^6 n. In addition, A_{i+4} and A_{i+5} may inherit n/2^{d+j+1} ones from their parent. Thus, A_{i+1} through A_{i+5} may receive N/2^{j-1} + N/2^j < N/2^{j-2} ones. Therefore, with β = 1/2^j, we may apply the Sliding-Window Lemma (3.2) to show that, with probability at least 1 - n^{-4}, for j ≥ 4 and c ≥ 4,

$$k_1^{(c)} \le 2\beta^c N \le \frac{n}{2^{d+jc}} \le \frac{n}{3\cdot 2^{d+1+2j+3}} = \frac{\delta(A_{i+1})}{3},$$

where k_1^{(c)} is the number of ones left in A_{i+1} after a pass of Annealing sort with repetition factor c.
• Type 3: δ(A_i) = n/2^{d+1+2j+4}, with j = 3. Since A_i is a left child of A_p, in this case, A_i could get at most n/2^{d+j+3} + 8ed log^2 n ones from regions left of A_i, by Lemma 3.6. In addition, A_i and A_{i+1} could inherit at most δ(A_p) = n/2^{d+j+3} ones from A_p. Thus, if we let N denote the size of A_i, i.e., N = n/2^{d+1}, then A_i and A_{i+1} together have at most N/2^{j+1} + 3N^{1/2} < N/2^j = N/2^3 ones, since we stop Phase 1 when N < 64e^2 log^6 n. In addition, A_{i+2} and A_{i+3} inherit at most n/2^{d+j+2} = N/2^4 ones from their parent. Finally, A_{i+4} inherits at most n/(5·2^d) = 2N/5 ones from its parent. Thus, A_i through A_{i+4} inherit at most N/2^3 + N/2^4 + 2N/5 < 5N/2^3 ones. Thus, we can apply the Sliding-Window Lemma (3.2), with β = 5/2^{j+2}, so that the following condition holds with probability at least 1 - n^{-4}, for c ≥ 5 and j = 3:

$$k_1^{(c)} \le 2\beta^c N \le \frac{5^c\,n}{2^{d+(j+2)c}} \le \frac{n}{3\cdot 2^{d+1+2j+4}} = \frac{\delta(A_i)}{3},$$

where k_1^{(c)} is the number of ones left in A_i after a pass of Annealing sort with repetition factor c and temperature 4N.

• Type 4: δ(A_{i+1}) = n/2^{d+1+2j+3}, with j = 3. Since A_{i+1} is a right child of A_p, in this case, A_{i+1} could get at most n/2^{d+j+3} + 8ed log^2 n ones from regions left of A_{i+1}, by Lemma 3.6, plus A_{i+1} could inherit at most δ(A_p) = n/2^{d+j+3} ones from A_p. In addition, since j > 2, A_{i+2} and A_{i+3} could inherit at most n/2^{d+j+2} ones from their parent. Thus, if we let N denote the size of A_{i+1}, i.e., N = n/2^{d+1}, then A_{i+1}, A_{i+2}, and A_{i+3} together have at most N/2^j + 3N^{1/2} < N/2^{j-1} ones, since we stop Phase 1 when N < 64e^2 log^6 n. In addition, A_{i+4} and A_{i+5} may inherit n/(5·2^d) ones from their parent. Thus, A_{i+1} through A_{i+5} may receive N/2^{j-1} + 2N/5 < (2/3)N ones. Therefore, with β ≤ 1/6, we may apply the Sliding-Window Lemma (3.2) to show that, with probability at least 1 - n^{-4}, for j = 3 and c ≥ 6,

$$k_1^{(c)} \le 2\beta^c N \le \frac{n}{3^c\,2^{d+c}} \le \frac{n}{3\cdot 2^{d+1+2j+3}} = \frac{\delta(A_{i+1})}{3},$$

where k_1^{(c)} is the number of ones left in A_{i+1} after a pass of Annealing sort with repetition factor c.
• Type 5: δ(A_i) = n/2^{d+1+2j+4}, with j = 2. Since A_i is a left child of A_p, in this case, A_i could get at most n/2^{d+j+3} + 8ed log^2 n ones from regions left of A_i, by Lemma 3.6. In addition, A_i and A_{i+1} could inherit at most δ(A_p) = n/2^{d+j+3} ones from A_p. Thus, if we let N denote the size of A_i, i.e., N = n/2^{d+1}, then A_i and A_{i+1} together have at most N/2^{j+1} + 3N^{1/2} < N/2^j = N/2^2 ones, since we stop Phase 1 when N < 64e^2 log^6 n. In addition, A_{i+2} and A_{i+3} inherit at most 2N/5 ones from their parent. Thus, we can apply the Fractional-Depletion Lemma (3.4), with α = 3 and β ≤ 1/6, so that the following condition holds with probability at least 1 - n^{-4}, for c ≥ 9 and j = 2:

$$k_1^{(c)} \le 2\left(\frac{1}{4} + \frac{1}{6}\right)^c N \le \frac{n}{3\cdot 2^{d+1+2j+4}} = \frac{\delta(A_i)}{3},$$

where k_1^{(c)} is the number of ones left in A_i after a pass of Annealing sort with repetition factor c and temperature 4N.

• Type 6: δ(A_{i+1}) = n/2^{d+1+2j+3}, with j = 2. Since A_{i+1} is a right child of A_p, in this case, A_{i+1} could get at most n/2^{d+j+3} + 8ed log^2 n ones from regions left of A_{i+1}, by Lemma 3.6, plus A_{i+1} could inherit at most δ(A_p) = n/2^{d+j+3} ones from A_p. In addition, since j = 2, A_{i+2} and A_{i+3} could inherit at most 2N/5 ones from their parent, where we let N denote the size of A_{i+1}, i.e., N = n/2^{d+1}. Thus, A_{i+1}, A_{i+2}, and A_{i+3} together have at most N/2^{j+1} + 3N^{1/2} + 2N/5 < (2/3)N ones, since we stop Phase 1 when N < 64e^2 log^6 n. Thus, A_{i+1} through A_{i+5} may receive N/2^j + 2N/5 < (2/3)N ones. Therefore, with α = 3 and β ≤ 1/6, we may apply the Fractional-Depletion Lemma to show that, with probability at least 1 - n^{-4}, for c ≥ 9 and j = 2,

$$k_1^{(c)} \le 2\left(\frac{1}{4} + \frac{1}{6}\right)^c N \le \frac{n}{3\cdot 2^{d+1+2j+3}} = \frac{\delta(A_{i+1})}{3},$$

where k_1^{(c)} is the number of ones left in A_{i+1} after a pass of Annealing sort with repetition factor c and temperature 4N.

• Type 7: δ(A_i) = n/2^{d+1+2j+4}, with j = 1. Since A_i is a left child of A_p, in this case, A_i could get at most n/2^{d+j+2} + 8ed log^2 n ones from regions left of A_i, by Lemma 3.6, plus A_i and A_{i+1} could inherit at most δ(A_p) = n/(5·2^d) ones from A_p. Thus, if we let N denote the size of A_i, i.e., N = n/2^{d+1}, then A_i and A_{i+1} together have at most N/2^{j+1} + 2N/5 + 3N^{1/2} < 7N/10 ones, since we stop Phase 1 when N < 64e^2 log^6 n. Thus, we may apply the Fractional-Depletion Lemma (3.4), with α = 1 and β = 0.175, so that the following condition holds with probability at least 1 - n^{-4}, for a suitably-chosen constant c, with j = 1:

$$k_1^{(c)} \le 2(0.925)^c N \le \frac{n}{3\cdot 2^{d+1+2j+4}} = \frac{\delta(A_i)}{3},$$

where k_1^{(c)} is the number of ones left in A_i after a pass of Annealing sort with repetition factor c.

Thus, A_i and A_{i+1} satisfy their respective desired dirtiness bounds w.v.h.p., provided they are at least two regions from the mixed region or crossover point.
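The arithmetic in the seven region types can be replayed with exact fractions. In the text, β appears to be the one-count budget of the window divided by 4N, the temperature. For example, Type 3's 5N/2^3 gives 5/2^{j+2}, and Type 7's 7N/10 gives 0.175. The bases 0.925 and 0.8 in the displayed Fractional-Depletion bounds also equal 1 - α/4 + β. The sketch therefore uses that expression, but as an assumption rather than as the lemma's statement. The function `required_c` finds the smallest repetition factor c with base^c ≤ 1/(3·2^e). This is what 2·base^c·N ≤ δ(A)/3 reduces to when N = n/2^{d+1} and δ(A) = n/2^{d+e}. All names are hypothetical.

```python
from fractions import Fraction as F

def beta_from_budget(budget):
    """beta for a one-count budget given as a multiple of N, assuming the
    window is the 4N positions reachable at temperature 4N."""
    return budget / 4

def depletion_base(alpha, beta):
    """1 - alpha/4 + beta; matches the bases 0.925 and 0.8 displayed in the
    text (an assumption about the Fractional-Depletion Lemma's form)."""
    return 1 - F(alpha, 4) + beta

def required_c(base, e):
    """Smallest integer c >= 1 with base**c <= 1 / (3 * 2**e), i.e. the
    repetition factor making 2 * base**c * N <= delta(A) / 3 when
    N = n / 2**(d+1) and delta(A) = n / 2**(d+e)."""
    target = F(1, 3 * 2 ** e)
    c = 1
    while base ** c > target:
        c += 1
    return c

# One-count budgets (multiples of N) from the case analysis, 3N^{1/2} slack dropped.
type3_budget = F(1, 8) + F(1, 16) + F(2, 5)  # j = 3, compare with 5N/2^3
type4_budget = F(1, 4) + F(2, 5)             # j = 3, compare with (2/3)N
type7_budget = F(1, 4) + F(2, 5)             # j = 1, compare with 7N/10

# (label, base, e), with e = 1 + 2j + 4 for A_i and e = 1 + 2j + 3 for A_{i+1}.
cases = [
    ("Type 1, j=4", F(1, 2 ** 4), 1 + 2 * 4 + 4),
    ("Type 2, j=4", F(1, 2 ** 4), 1 + 2 * 4 + 3),
    ("Type 3, j=3", beta_from_budget(F(5, 8)), 1 + 2 * 3 + 4),
    ("Type 4, j=3", beta_from_budget(F(2, 3)), 1 + 2 * 3 + 3),
    ("Type 5, j=2", depletion_base(3, F(1, 6)), 1 + 2 * 2 + 4),
    ("Type 6, j=2", depletion_base(3, F(1, 6)), 1 + 2 * 2 + 3),
    ("Type 7, j=1", depletion_base(1, beta_from_budget(F(7, 10))), 1 + 2 * 1 + 4),
]
for label, base, e in cases:
    print(label, base, required_c(base, e))
```

This reproduces the thresholds c ≥ 4 for Types 1 and 2, c ≥ 5 for Type 3, and c ≥ 9 for Type 5. For Types 4 and 6 it gives c = 5 and c = 8, which fall within the text's more conservative c ≥ 6 and c ≥ 9. For Type 7 it yields c = 77 as one concrete value of the suitably-chosen constant.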

Regions near the Crossover Point. Consider now regions near the crossover point, that is, each region with a parent that is mixed, bordering the crossover point, or next to a region that either contains or borders the crossover point. Let us focus specifically on the case when there is a mixed region on levels d and d + 1, as it is the most difficult of these scenarios.

So, having dealt with all the other regions, which have their desired dirtiness satisfied after a single up-and-down pass of Phase 1 with temperature 4N, we are left with four regions near the crossover point. Each has size N = n/2^{d+1}, and we will refer to them as A_1, A_2, A_3, and A_4. One of A_2 or A_3 is mixed; without loss of generality, let us assume A_3 is mixed. At this point in the algorithm, we perform another up-and-down pass with temperature 4N. So, let us consider how this pass impacts the dirtiness of these four regions. By the results of the previous pass with temperature 4N (which were proved above), we have at this point pushed to these four regions all but at most n/2^{d+7} + 8e(d + 1) log^2 n of the ones. We have likewise pushed to them all but at most n/2^{d+6} + 8e(d + 1) log^2 n of the zeroes. Moreover, these bounds will continue to hold (and could even improve) as we perform the second up-and-down pass with temperature 4N. Thus, at the beginning of this second pass, we know that the four regions hold between 2N - N/32 - 3N^{1/2} and 3N + N/64 + 3N^{1/2} zeroes, and between N - N/64 - 3N^{1/2} and 2N + N/32 + 3N^{1/2} ones, where N = n/2^{d+1} ≥ 64e^2 log^6 n.



Let us therefore consider the impact of the second pass with temperature 4N on each of these four regions:

• A_1: this region is compared to A_2, A_3, and A_4 during the up-pass. Thus, we may apply the Fractional-Depletion Lemma (3.4) with α = 3. Note, in addition, that, for N large enough, there are at most 2N + N/32 + 3N^{1/2} < 2.2N ones in all of these four regions, so we may apply the Fractional-Depletion Lemma with β = 0.55. Thus, the following condition holds with probability at least 1 - n^{-4}, for a suitably-chosen constant c:

$$k_1^{(c)} \le 2(0.8)^c N,$$

where k_1^{(c)} is the number of ones left in A_1 after a pass of Annealing sort with repetition factor c and temperature 4N.

• A_2: each element of this region is compared to elements in A_3 and A_4 in the up-pass and A_1 in the down-pass. Note, however, that even if A_1 receives zeroes in the up-pass, there are still at most 2N + N/32 + 3N^{1/2} < 2.2N ones in A_2 ∪ A_3 ∪ A_4. Thus, even under this worst-case scenario (from A_2's perspective), we may apply the Startup Lemma (3.5), with α = 2, β = 0.55, and λ = 1/6, which implies that

$$1 - \alpha/4 + \beta - \lambda < 1 - 1/10,$$

i.e., we can take ε = 1/10 and show that there is a constant c such that, w.v.h.p.,

$$k_1^{(c)} \le \tfrac{1}{3}\,\delta(A_2),$$

where k_1^{(c)} is the number of ones left in A_2 after an up-pass of Annealing sort with repetition factor c and temperature 4N.

• A_3: by assumption, A_3 is mixed, so it automatically satisfies its desired dirtiness bound.

• A_4: this region is compared to A_1, A_2, and A_3 in the down-pass. Note further that, w.v.h.p., there are at most 3N + N/64 + 3N^{1/2} < 3.2N zeroes in these four regions, for large enough N. Thus, we may apply a symmetric version of the Startup Lemma (3.5), with α = 3, β = 0.8, and λ = 1/6, which implies

$$1 - \alpha/4 + \beta - \lambda < 1 - 1/10,$$

i.e., we can take ε = 1/10 and show that there is a constant c such that, w.v.h.p.,

$$k_1^{(c)} \le \tfrac{1}{3}\,\delta(A_4),$$

where k_1^{(c)} is the number of zeroes left in A_4 after a down-pass of Annealing sort with repetition factor c and temperature 4N.
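The Startup Lemma conditions used near the crossover point, and in the base case, reduce to exact arithmetic. This sketch evaluates 1 - α/4 + β - λ for the three parameter sets in the text. It also checks the slack inequalities that turn the one and zero counts into β = 0.55 and β = 0.8, that is, 2.2N/4N and 3.2N/4N. Logarithms in N ≥ 64e^2 log^6 n are assumed to be base 2. The names are hypothetical.

```python
import math
from fractions import Fraction as F

def startup_margin(alpha, beta, lam):
    """1 - alpha/4 + beta - lambda, compared with 1 - epsilon in the text."""
    return 1 - F(alpha) / 4 + F(beta) - F(lam)

def slack_ok(N):
    """Check 2N + N/32 + 3 sqrt(N) < 2.2N (ones) and
    3N + N/64 + 3 sqrt(N) < 3.2N (zeroes) for a region size N."""
    ones_ok = 2 * N + N / 32 + 3 * math.sqrt(N) < 2.2 * N
    zeroes_ok = 3 * N + N / 64 + 3 * math.sqrt(N) < 3.2 * N
    return ones_ok and zeroes_ok

cases = {
    "base case, A_1": (1, F(1, 4), F(1, 6)),         # needs <= 1 - 1/6
    "A_2 near crossover": (2, F(55, 100), F(1, 6)),  # needs < 1 - 1/10
    "A_4 near crossover": (3, F(8, 10), F(1, 6)),    # zeroes; needs < 1 - 1/10
}
for name, (alpha, beta, lam) in cases.items():
    print(name, startup_margin(alpha, beta, lam))

# beta as the count bound divided by the 4N-position window (our reading):
print(F(22, 10) / 4, F(32, 10) / 4)  # 11/20 = 0.55 and 4/5 = 0.8
```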

Thus, after the two up-and-down passes of Annealing sort with temperature 4N, we will have satisfied Claim 3.7 w.v.h.p. In particular, we have proved that each region satisfies Claim 3.7 after iteration 2(d + 1) of Phase 1 of Annealing sort with a failure probability of at most O(n^{-4}), for each region. Thus, since there are O(n) such regions per iteration, any iteration will fail with probability at most O(n^{-3}). There are O(log n) iterations, and we lose only an O(n) factor in our failure probability when we apply the probabilistic zero-one principle (Lemma 3.1). Therefore, when we complete the first phase of Annealing sort, w.v.h.p., at the beginning of Phase 2, the total dirtiness of all extreme regions is at most 8e log^3 n, and the size of each such region is g log^6 n, for g = 64e^2.
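The failure-probability bookkeeping in the preceding paragraph can be written out as follows, using d ≤ log n (regions stop at size 64e^2 log^6 n ≥ 1) for the dirtiness bound:

$$
\underbrace{O(n^{-4})}_{\text{per region}}\cdot\underbrace{O(n)}_{\text{regions per iteration}} = O(n^{-3}),
\qquad
O(n^{-3})\cdot\underbrace{O(\log n)}_{\text{iterations}}\cdot\underbrace{O(n)}_{\text{zero-one principle}} = O\!\left(\frac{\log n}{n^{2}}\right),
$$

$$
8e\,d\log^{2} n \;\le\; 8e\log^{3} n \qquad\text{since } d \le \log n .
$$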